

# Pulse Detector HDL Workflow Tutorial R2024a - R2024b

MathWorks Application Engineering



### **Example Overview**

In this tutorial, you are provided with the MATLAB® reference of a pulse detection algorithm. The algorithm detects a known waveform in a received signal by applying a matched filter and finding the resulting peak. It is a commonly used technique in radar and wireless communication systems.

#### MATLAB golden reference

```
% pulse, PulseLen, TxSignal and RxSignal are defined in pulse_detector_reference.mlx

% Create matched filter coefficients
CorrFilter = conj(flip(pulse))/PulseLen;

% Correlate Rx signal against matched filter
FilterOut = filter(CorrFilter,1,RxSignal);

% Find peak magnitude & location
[peak, location] = max(abs(FilterOut));

% Plot results
figure(1)
subplot(311); plot(real(TxSignal)); title('Tx Signal (real)');
subplot(312); plot(real(RxSignal)); title('Rx Signal (real)');

t = 1:length(FilterOut);
str = sprintf('Peak found at %d with a value of %.3e',location,peak);
subplot(313); plot(t,abs(FilterOut),location,peak,'o'); title(str);
```
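To experiment with the golden reference outside the tutorial files, the following sketch builds synthetic stand-ins for pulse and RxSignal and then runs the three reference lines unchanged. The chirp pulse, PulseLen = 64, the pulse offset and the noise level are assumptions chosen for illustration, not the values used by pulse\_detector\_reference.mlx. Because the matched filter is the time-reversed conjugate of the pulse, the peak appears at the last sample of the embedded pulse.

```matlab
% Synthetic stand-in data (assumed values, not the tutorial's reference data)
rng(1);                                        % reproducible noise
PulseLen = 64;                                 % assumed pulse length
m = 0:PulseLen-1;
pulse = exp(1j*pi*m.^2/PulseLen).';            % unit-magnitude linear chirp (column)
offset = 300;                                  % assumed delay before the pulse starts
RxSignal = zeros(1000,1);
RxSignal(offset+1:offset+PulseLen) = pulse;    % embed the pulse in the received signal
RxSignal = RxSignal + 0.05*(randn(1000,1) + 1j*randn(1000,1))/sqrt(2);

% Golden reference lines, unchanged
CorrFilter = conj(flip(pulse))/PulseLen;
FilterOut = filter(CorrFilter,1,RxSignal);
[peak, location] = max(abs(FilterOut));

% The filter output peaks when the whole pulse is inside the filter,
% i.e. at the last pulse sample, index offset + PulseLen
fprintf('Peak found at %d with a value of %.3e\n', location, peak);
```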






This tutorial will guide you through the steps necessary to implement the algorithm in FPGA hardware, including:

- Create a Simulink® model for the algorithm
- Implement the hardware architecture
- Convert the design to fixed-point
- Generate and synthesize the HDL code




### Software Requirements

- You will need the following MathWorks products:
  - MATLAB (R2024a or R2024b)
  - Simulink
  - Fixed-Point Designer<sup>TM</sup>
  - MATLAB Coder<sup>TM</sup>
  - HDL Coder<sup>TM</sup>
  - Signal Processing Toolbox<sup>TM</sup>
  - DSP System Toolbox<sup>TM</sup>
  - DSP HDL Toolbox<sup>TM</sup>
- AMD Vivado<sup>TM</sup> 2023.1 is used for the last step (Step 4).
  - Other Vivado versions typically work, but synthesis results may differ.



### Preparation

- You should be familiar with MATLAB and have basic knowledge of Simulink such as:
  - Creating Simulink models (adding & connecting blocks, creating subsystems, etc.)
  - Setting block and model-level parameters
  - Updating and simulating models, and viewing simulation results
  - Interacting with MATLAB using workspace variables and signal logging
  - Using fixed-point data types in Simulink
- Use the following resources as needed to prepare for this tutorial:
  - MATLAB Onramp
  - Simulink Onramp
  - HDL Coder Tutorial Overview (video series)
  - HDL Coder Evaluation Reference Guide
- Locate the example files in the folder: /pulse\_detector/work



### Step 1: Streaming Simulink model

### In this step, you will:

- Create a Simulink model with streaming input
- Implement a hardware-friendly peak finder
- Compare the Simulink pulse detector to the MATLAB golden reference





### Step 1.1: Stream and filter input signal

1. Run **pulse\_detector\_reference.mlx** to initialize the parameters needed in this step.
2. Create a new Simulink model and name it pulse\_detector\_v1.slx.
3. Add a **Signal From Workspace** block, and enter RxSignal for Signal on the block dialog. This streams the vector RxSignal one sample at a time.
4. Implement the filter function using a **Discrete FIR Filter** block from the Simulink library. Enter CorrFilter for Coefficients.
5. Name the filter block output signal filter\_out, and enable data logging for the signal.
6. Enter SimTime for the simulation stop time.



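Tip: Make sure to choose the right blocks. Some blocks have similarly named counterparts in other libraries; for example, a **Discrete FIR Filter** block exists in both the Simulink library (used here) and the DSP HDL Toolbox library (used in Step 2.3).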




7. Compare the Simulink filter output against the MATLAB reference. Verifying your Simulink model incrementally helps catch mistakes early.
   - Run the test bench script pulse\_detector\_v1\_tb.mlx. An error is expected at line 16 because the model is incomplete and mag\_sq\_out is not logged yet.
   - Verify that the Simulink filter output matches the MATLAB reference (maximum error on the order of 10<sup>-16</sup>).

Excerpt from pulse\_detector\_v1\_tb.mlx:

```
% Simulate model and compare results to reference
if iscolumn(CorrFilter)
    CorrFilter = transpose(CorrFilter); % need row vector for filter block
end
SimTime = length(RxSignal) + WindowLen;

% Simulate model
slout = sim('pulse_detector_v1');

% Correlation filter output
FilterOutSL = getLogged(slout, 'filter_out');
compareData(real(FilterOut),real(FilterOutSL),{2 3 1},'ML vs SL correlator output (re)');
compareData(imag(FilterOut),imag(FilterOutSL),{2 3 2},'ML vs SL correlator output (im)');

% Magnitude squared output
MagSqSL = getLogged(slout, 'mag_sq_out');
compareData(MagSqOut,MagSqSL,{2 3 3},'ML vs SL mag-squared output');

% Peak value
MidSampleSL = getLogged(slout, 'mid_sample');
```

```
Maximum error for ML vs SL correlator output (re) out of 5000 values
 1.665335e-16 (absolute), 6.301458e-14 (percentage)
Maximum error for ML vs SL correlator output (im) out of 5000 values
 8.326673e-17 (absolute), 7.535410e-14 (percentage)
Warning: Did not find any Dataset element using 'mag_sq_out'.
Error using getLogged
Signal 'mag_sq_out' not found. Make sure it is logged and named correctly.
```

Figure: ML vs SL correlator output (re), max error = 1.665e-16, and ML vs SL correlator output (im), max error = 8.327e-17, each compared with the reference.
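This comparison works because streaming the input one sample at a time through a FIR delay line computes the same convolution as the vectorized filter call; only the order of floating-point operations differs, which is why the expected error is around 1e-16 rather than exactly zero. The sketch below illustrates this with a hypothetical random pulse and received signal (assumptions, not the tutorial data). It models the delay line explicitly and does not use any Simulink blocks.

```matlab
% Sample-by-sample FIR versus vectorized filter() (illustrative sketch)
rng(5);                                              % reproducible test data
PulseLen = 16;                                       % hypothetical pulse length
pulse = exp(1j*2*pi*rand(PulseLen,1));               % hypothetical unit-magnitude pulse
CorrFilter = conj(flip(pulse))/PulseLen;             % matched filter, as in the reference
RxSignal = (randn(200,1) + 1j*randn(200,1))/sqrt(2); % hypothetical received samples

taps = zeros(PulseLen,1);                            % delay line, newest sample first
FilterOutStream = zeros(size(RxSignal));
for n = 1:numel(RxSignal)
    taps = [RxSignal(n); taps(1:end-1)];             % shift in one new sample per step
    FilterOutStream(n) = sum(CorrFilter .* taps);    % y(n) = sum_k h(k) x(n-k+1)
end

FilterOut = filter(CorrFilter,1,RxSignal);           % frame-based reference
fprintf('max error = %.3e\n', max(abs(FilterOutStream - FilterOut)));
```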



### Step 1.2: Hardware-friendly peak finder (magnitude-squared)

1. Review the section *Hardware friendly implementation of peak finder* in **pulse\_detector\_reference.mlx**.
2. At the filter block output, implement magnitude-squared (the sum of the squared real and imaginary parts) using multiply and add blocks rather than computing the magnitude with a square root.
3. Group the blocks in a subsystem and name it Compute Power. Log the subsystem output as mag\_sq\_out.
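Comparing magnitude-squared values instead of magnitudes avoids a square root in hardware, and because squaring is monotonic for non-negative numbers, the peak is found at the same sample. As a sanity check on the values printed in Step 1.4, 0.3164² ≈ 0.1001, which matches the reported magnitude (3.164e-01) and mag-squared value (1.001e-01). The sketch uses a hypothetical random complex signal in place of the filter output.

```matlab
% Magnitude-squared peak search versus magnitude peak search (illustrative sketch)
rng(2);                                                % reproducible test data
FilterOut = (randn(500,1) + 1j*randn(500,1))/sqrt(2);  % hypothetical complex filter output
MagSq = real(FilterOut).^2 + imag(FilterOut).^2;       % two multiplies and one add, no sqrt
[~, locMag]   = max(abs(FilterOut));                   % software-style peak search
[~, locMagSq] = max(MagSq);                            % hardware-friendly peak search
fprintf('abs peak at %d, mag-squared peak at %d\n', locMag, locMagSq);
fprintf('max |abs^2 - MagSq| = %.3e\n', max(abs(abs(FilterOut).^2 - MagSq)));
fprintf('0.3164^2 = %.4f\n', 0.3164^2);                % compare with the Step 1.4 printout
```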







### Step 1.3: Hardware-friendly peak finder (local peak)

1. Connect mag\_sq\_out to a **Tapped Delay** block for the sliding window buffer. Set Number of delays to WindowLen.
2. Implement the rest of the hardware-friendly peak finder using a **MATLAB Function** block. Copy the code from pulse\_detector\_reference.mlx (line 42 to the end) into the block, and modify it to match the screenshot.
3. Click **Edit Data** to open the **Symbols** pane and the **Property Inspector**. Select WindowLen, change its Scope to Parameter, and clear the Tunable check box.






4. Navigate back to the top-level model. Add a **Constant** block and enter threshold for Constant value.
5. Connect the constant, tapped delay and MATLAB Function blocks, and group the three blocks in a subsystem named Local Peak.
6. Connect the Local Peak outputs to a **Unit Delay Enabled Synchronous** (UDES) block, and name and log the connecting signals mid\_sample and detected.
7. Finally, connect the UDES block to a **Display** block. Because detected drives the enable input, the UDES block holds the detected peak value, which remains on display at the end of the simulation.
8. Save the model.
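To see how the sliding window, threshold and detected flag fit together, here is a behavioral sketch of a streaming local-peak detector. It reuses the detection condition shown in Step 3.3 (`all(CompareOut <= 0) && (MidSample > threshold)`), but the window handling, the odd WindowLen = 11 and the synthetic input are assumptions for illustration; it is not the code from pulse\_detector\_reference.mlx. The threshold of 0.03 is the value reported in the Step 3.3 precision-loss warning.

```matlab
% Behavioral sketch of a streaming local-peak detector (assumed window handling)
rng(0);                                  % reproducible noise
WindowLen = 11;                          % assumed odd window length
threshold = 0.03;                        % detection threshold
magSq = 0.001*rand(1,200);               % synthetic mag-squared noise floor
magSq(120) = 0.1;                        % synthetic pulse peak at sample 120
buf = zeros(1,WindowLen);                % sliding window, newest sample first
midIdx = (WindowLen+1)/2;                % middle of the window
detectedAt = [];                         % input indices flagged as peaks
for n = 1:numel(magSq)
    buf = [magSq(n) buf(1:end-1)];       % shift one new sample into the window
    MidSample = buf(midIdx);
    CompareOut = buf - MidSample;        % all <= 0 when MidSample is the window maximum
    detected = all(CompareOut <= 0) && (MidSample > threshold);
    if detected
        detectedAt(end+1) = n - (midIdx-1); %#ok<AGROW> % undo the half-window latency
    end
end
disp(detectedAt)
```

The flag is raised midIdx-1 samples after the peak enters the window, which is why a hardware implementation reports a detection event rather than the original sample index.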







### Step 1.4: Compare model to MATLAB reference

Run **pulse\_detector\_v1\_tb.mlx** to simulate the model, and compare the Simulink outputs to the MATLAB reference.

- Maximum error for the correlator, magnitude-squared and peak value should be on the order of floating-point eps (about 10<sup>-16</sup>).
- The peak location is randomized for each run.
- The Simulink model produces a detected flag instead of an index for the peak location. The algorithm is often used to find the beginning of a data frame, for which a detection flag is sufficient.

```
Maximum error for ML vs SL correlator output (re) out of 5000 values
8.326673e-17 (absolute), 2.648455e-14 (percentage)

Maximum error for ML vs SL correlator output (im) out of 5000 values
9.714451e-17 (absolute), 8.099696e-14 (percentage)

Maximum error for ML vs SL mag-squared output out of 5000 values
2.775558e-17 (absolute), 2.772557e-14 (percentage)
```



Peak location = 2566, magnitude = 3.164e-01 using global max

Peak location = 2566, mag-squared = 1.001e-01 using local max

Peak mag-squared from Simulink = 1.001e-01, error = 2.776e-17



### Step 2: Hardware microarchitecture

### In this step, you will:

- Prepare the model for HDL code generation
- Add a data valid control signal
- Use hardware-efficient blocks and pipeline the data path
- Compare the Simulink architecture model to the MATLAB golden reference





### Step 2.1: HDL model configurations

1. Save the Simulink model as pulse\_detector\_v2.slx.
2. Run the MATLAB command hdlsetup('pulse\_detector\_v2'). This configures the model settings to be compatible with HDL code generation.
3. Group the filter block and the *Compute Power* and *Local Peak* subsystems into a top-level subsystem named Pulse Detector. This top-level subsystem is referred to as the **DUT** (Design Under Test), i.e., the portion of the model from which HDL code is generated.

Note: **hdlsetup** enables sample time color display for the model. The next time you update or simulate the model, blocks appear red because they operate at the fastest sample rate.

```
>> hdlsetup('pulse_detector_v2')
### AlgebraicLoopMsg value is set from 'warning' to 'error' (revert).
### BlockReduction value is set from 'on' to 'off' (revert).
### ConditionallyExecuteInputs value is set from 'on' to 'off' (revert).
### DefaultParameterBehavior value is set from 'Tunable' to 'Inlined' (revert).
### InheritOutputTypeSmallerThanSingle value is set from 'off' to 'on' (revert).
### ProdHWDeviceType value is set from 'Intel->x86-64 (Windows64)' to 'ASIC/FPGA-:
### SingleTaskRateTransMsg value is set from 'none' to 'error' (revert).
### Solver value is set from 'VariableStepAuto' to 'FixedStepDiscrete' (revert).
### The listed configuration parameter values are modified as a part of hdlsetup.
```





### Step 2.2: Implement data valid interface

1. Add a data valid input using a **Signal From Workspace** block; enter true(size(RxSignal)) for the Signal parameter. This enables the design to interface with a non-continuous data source, although the example uses continuous input data for simplicity.
2. Connect the block to the DUT subsystem as a second input.
3. Inside the DUT subsystem, name the first input port data\_in and the new input port valid\_in.
4. Add a new output port and name it valid\_out.







### Step 2.3: HDL optimized FIR filter

1. Replace the current FIR filter with a **Discrete FIR Filter** block from the **DSP HDL Toolbox** library. This block offers systolic filter architectures and pipeline register placements that are designed to use FPGA DSP resources efficiently. It also provides control signals for common data interfaces.
2. Again, enter CorrFilter for Coefficients, name the data output signal filter\_out, and log the signal.
3. Connect the valid\_in port to the FIR valid input, and the FIR valid output to the valid\_out port. Log the FIR valid output as filter\_valid.
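Conceptually, a valid-qualified filter advances its delay line only on cycles where the valid input is high, so the stream of valid outputs equals the frame-based filter applied to the valid samples alone. The sketch below models that behavior with hypothetical coefficients, data and a bursty valid pattern. It ignores the pipeline latency of the real DSP HDL Toolbox block and is not its implementation. Selecting outputs with validOut mirrors how the Step 2.5 test bench qualifies filter\_out with filter\_valid.

```matlab
% Behavioral sketch of a valid-qualified FIR (zero-latency model, assumed behavior)
rng(3);                                   % reproducible test data
coeffs = randn(1,8);                      % hypothetical real FIR coefficients
x = randn(1,100);                         % hypothetical input samples
valid = rand(1,100) > 0.3;                % hypothetical bursty valid pattern
taps = zeros(1,numel(coeffs));            % delay line, newest sample first
y = zeros(1,100);
validOut = false(1,100);
for n = 1:100
    if valid(n)
        taps = [x(n) taps(1:end-1)];      % advance the delay line only on valid cycles
        y(n) = taps * coeffs.';           % inner product of taps and coefficients
        validOut(n) = true;               % zero-latency model: valid out follows valid in
    end
end
ref = filter(coeffs, 1, x(valid));        % frame-based reference on the valid samples
fprintf('max error = %.3e\n', max(abs(y(validOut) - ref)));
```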





Figure 1-1: Basic DSP48E1 Slice Functionality



### Step 2.4: Insert pipeline registers

1. Pipeline the data input and output using **Delay** blocks, and add matching delays to the valid signal path.
2. The FIR filter from DSP HDL Toolbox already contains pipeline registers, so no action is required.






3. Open the HDL Code Toolstrip under Apps > Code Generation > HDL Coder.
4. Select the Compute Power subsystem and open its HDL Property Inspector / HDL Block Properties dialog using the HDL Code Toolstrip. Set AdaptivePipelining to on, then close the HDL properties dialog.
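Adding the same number of registers to the data and valid paths keeps each valid flag aligned with its sample. The following sketch, using a hypothetical data stream, valid pattern and a 2-cycle pipeline, shows that a matched delay recovers the valid samples exactly, while a delay that is one cycle short does not.

```matlab
% Matched versus mismatched pipeline delays on the data and valid paths (sketch)
x     = [3 1 4 1 5 9 2 6];                   % hypothetical data stream
valid = logical([1 1 0 1 1 0 1 1]);          % hypothetical valid pattern
d = 2;                                       % pipeline depth on the data path
xPad = [x zeros(1,d)];                       % extra cycles flush the pipeline
vPad = [valid false(1,d)];
delayBy = @(s,k) [zeros(1,k) s(1:end-k)];    % k-cycle delay line (k >= 1)
xd     = delayBy(xPad, d);                   % data after d registers
okVal  = logical(delayBy(double(vPad), d));  % valid delayed to match the data
badVal = logical(delayBy(double(vPad), d-1));% valid delayed one cycle too little
disp(isequal(xd(okVal),  x(valid)))          % matched: same samples are qualified
disp(isequal(xd(badVal), x(valid)))          % mismatched: wrong samples are qualified
```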










### Step 2.5: Compare architecture model to MATLAB reference

1. In the top-level model, terminate the DUT output port *valid\_out* by connecting it to a **Terminator** block. Save the model.
2. Run pulse\_detector\_v2\_tb.mlx to simulate the model, and compare the Simulink outputs to the MATLAB reference.
   - This test bench uses the new filter\_valid signal to qualify the logged filter and magnitude-squared outputs:

```
% Correlation filter output
FilterOutSL = getLogged(slout, 'filter_out');
FilterValid = getLogged(slout, 'filter_valid');
FilterOutSL = FilterOutSL(FilterValid);
```

 Maximum error for all outputs should be similar to that of step 1.





Question: What's wrong with the valid signal implementation?

Answer: The valid signal should also be connected to the sliding window buffer, to prevent invalid samples from entering the delay line. The simulation is only correct in this example because valid\_in is always high.
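The effect described in the answer can be reproduced with a small behavioral sketch. It assumes, hypothetically, that every fourth cycle is invalid and carries a junk value above the threshold, then counts detections with and without gating the sliding window on valid. The window length, threshold and data are illustrative assumptions, and the detection condition is the one used in the Local Peak MATLAB Function block.

```matlab
% Sliding window gated on valid versus always shifting (illustrative sketch)
rng(4);                                          % reproducible noise
WindowLen = 5;  midIdx = 3;  threshold = 0.03;   % assumed window and threshold
N = 80;
valid  = mod(1:N,4) ~= 0;              % hypothetical: every 4th cycle is invalid
dataIn = 0.001*rand(1,N);              % noise floor on valid cycles
dataIn(41) = 0.1;                      % one true peak (cycle 41 is valid)
dataIn(~valid) = 0.5;                  % junk carried on invalid cycles
modes  = [true false];                 % first gate the window on valid, then do not
counts = zeros(1,2);                   % number of detections for each mode
for k = 1:2
    buf = zeros(1,WindowLen);
    for n = 1:N
        if valid(n) || ~modes(k)
            buf = [dataIn(n) buf(1:end-1)];    % shift the sliding window
            MidSample = buf(midIdx);
            if all(buf - MidSample <= 0) && (MidSample > threshold)
                counts(k) = counts(k) + 1;
            end
        end
    end
end
fprintf('detections with gating: %d, without gating: %d\n', counts(1), counts(2));
```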



### Step 3: Fixed-point conversion

### In this step, you will:

- Convert the model to fixed-point
- Compare the Simulink fixed-point model to the MATLAB golden reference





### Step 3.1: Define input and filter fixed-point data types

1. Save the Simulink model as pulse\_detector\_v3.slx.
2. In the top-level model, add a **Data Type Conversion** block at the output of *RxSignal*. Set Output data type to DT\_input, which the test bench script defines as fixdt(1,16,14) so that the full range from -1 to 1 is representable.
3. Inside the DUT, set Coefficients Data Type on the FIR filter block to DT\_coeff, which is defined as fixdt(1,18). When the fraction length of a constant is left unspecified, it is determined automatically.
4. The filter is set up to perform multiply and add using full-precision fixed-point arithmetic. Run pulse\_detector\_v3\_tb.mlx to update the filter output data type (an error is expected because the model is only partially converted to fixed-point).
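For a signed binary-point type fixdt(1,WL,FL), the LSB is 2^-FL and the representable range is [-2^(WL-FL-1), 2^(WL-FL-1) - 2^-FL]. The sketch below evaluates this by hand for the three types defined in the Step 3.4 test bench (DT\_input, DT\_filter and DT\_power), without using Fixed-Point Designer. It confirms, for example, that fixdt(1,16,14) covers the full -1 to 1 input range, since its range is [-2, 2 - 2^-14].

```matlab
% Range and resolution of signed binary-point types fixdt(1,WL,FL), computed by hand
names = {'DT_input', 'DT_filter', 'DT_power'};   % types from the Step 3.4 test bench
WL = [16 18 18];                                 % word lengths
FL = [14 15 11];                                 % fraction lengths
lsb = 2.^-FL;                                    % resolution (one LSB)
lo  = -2.^(WL-1) .* lsb;                         % most negative value
hi  = (2.^(WL-1) - 1) .* lsb;                    % most positive value
for k = 1:numel(names)
    fprintf('%-9s fixdt(1,%d,%d): [%g, %.10g], LSB = %.4g\n', ...
        names{k}, WL(k), FL(k), lo(k), hi(k), lsb(k));
end
```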





Tip: Update the model with **Update Model** on the **Debug** tab (Ctrl+D). It compiles the model to check for static errors.

### Step 3.2: Define magnitude-squared data types

1. In the *Compute Power* subsystem, add a **Data Type Conversion** block after the input port, and set Output data type to DT\_filter. This reduces the data word length to 18 bits while maintaining the integer range.
2. Update the model to examine the data types through the subsystem (an error is still acceptable at this point).
3. Reduce the final adder output to DT\_power using another Data Type Conversion block, and update the model once more (an error is still acceptable).
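A quick worst-case check shows why DT\_power = fixdt(1,18,11) has enough integer range for the magnitude-squared output. With DT\_filter = fixdt(1,18,15), the real and imaginary parts each lie in [-4, 4), so re² + im² is at most 32, while DT\_power reaches just under 64. The arithmetic is sketched below. In practice the signals are much smaller than this bound, and the extra fraction bits of the full-precision products (two 18-bit operands give a 36-bit product) are reduced by the final Data Type Conversion.

```matlab
% Worst-case magnitude-squared for DT_filter inputs versus the DT_power range
WLf = 18;  FLf = 15;                        % DT_filter = fixdt(1,18,15)
maxAbs = 2^(WLf-FLf-1);                     % |re|, |im| <= 4 (the most negative value is -4)
worstMagSq = maxAbs^2 + maxAbs^2;           % re^2 + im^2 <= 32
WLp = 18;  FLp = 11;                        % DT_power = fixdt(1,18,11)
powerMax = (2^(WLp-1) - 1) * 2^-FLp;        % largest DT_power value (64 - 2^-11)
productBits = 2*WLf;                        % full-precision 18x18 product width
fprintf('worst case %g <= DT_power max %.11g; product width %d bits\n', ...
    worstMagSq, powerMax, productBits);
```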



Tip: 18-bit inputs and full-precision fixed-point allow multiply-add operations (including those within the FIR filter in the previous step) to be implemented using DSP blocks in most FPGAs.



### Step 3.3: Define peak picker data types

1. In the *Local Peak* subsystem, add a **Data Type Duplicate** block. Connect one port to the threshold constant block output, and another to the tapped delay block output.
2. Set Output data type of the threshold constant to Inherit via back propagation.

These two steps allow the threshold constant to take on the same data type as the input data.

3. In the MATLAB Function block, change the detected value to true/false.

```
if all(CompareOut <= 0) && (MidSample > threshold)
    detected = true;
else
    detected = false;
end
```

4. Update and save the model. It should now be free of errors, although Simulink may report the following precision-loss warning for the threshold constant:



Parameter precision loss occurred for 'Value' of 'pulse\_detector\_v3/Pulse\_Detector/Local\_Peak/Constant'. The original value of the parameter, 0.03, cannot be represented exactly using the run-time data type sfix18\_En11. The value is quantized to 0.02978515625. Quantization error occurred with an absolute difference of 0.0002148437499999989 and a relative difference of 0.71614583333333%.

Suggested actions:

- To disable this warning or error for all parameters in the model, set the 'Detect precision loss' option to 'none'.
- Inspect details in the Parameter Quantization Advisor.

Component: Simulink | Category: Block warning. This warning message can be safely ignored or suppressed.
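The quantized threshold in the warning can be reproduced by hand: 0.03 × 2^11 = 61.44, which rounds to 61, and 61/2048 = 0.02978515625. The sketch below assumes round-to-nearest with saturation (an assumption about the parameter quantization that happens to reproduce the reported value for 0.03) and uses only double arithmetic, not Fixed-Point Designer objects. The helper name roundToFixdt is hypothetical.

```matlab
% Hand quantization to a signed fixed-point type (assumed: round to nearest, saturate)
% roundToFixdt(v, WL, FL) is a hypothetical helper returning the quantized value as a double
roundToFixdt = @(v, WL, FL) min(max(round(v*2^FL), -2^(WL-1)), 2^(WL-1)-1) / 2^FL;

threshold = 0.03;
q = roundToFixdt(threshold, 18, 11);             % sfix18_En11
absErr = threshold - q;                          % absolute quantization error
relErr = 100*absErr/threshold;                   % relative error in percent
fprintf('quantized = %.11f, abs err = %.4g, rel err = %.6f%%\n', q, absErr, relErr);
```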



### Step 3.4: Compare fixed-point model to MATLAB reference

Run **pulse\_detector\_v3\_tb.mlx** to simulate the model, and compare the Simulink fixed-point outputs to the MATLAB floating-point reference.

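- Note the increase in error due to quantization. For example, the maximum magnitude-squared error below (about 4.9e-04) is on the order of one LSB of DT\_power, 2<sup>-11</sup> ≈ 4.88e-04.
- You may switch between fixed-point and floating-point using the following flag in the test bench.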

```
% Simulate model in fixed-point or floating-point
fxpt_mode = true;
if fxpt_mode % fixed-point
    DT_input  = fixdt(1,16,14);
    DT_filter = fixdt(1,18,15);
    DT_power  = fixdt(1,18,11);
else % floating-point
    DT_input  = 'double';
    DT_filter = 'double';
    DT_power  = 'double';
end
DT_coeff = fixdt(1,18); % coeff is treated as double
```

```
Maximum error for ML vs SL correlator output (re) out of 5000 values 8.713091e-06 (absolute), 4.352322e-03 (percentage)

Maximum error for ML vs SL correlator output (im) out of 5000 values 8.120594e-06 (absolute), 8.318296e-03 (percentage)

Maximum error for ML vs SL mag-squared output out of 5000 values 4.907380e-04 (absolute), 1.224467e+00 (percentage)
```

```
Peak location = 1268, magnitude = 2.002e-01 using global max

Peak location = 1268, mag-squared = 4.008e-02 using local max

Peak mag-squared from Simulink = 4.004e-02, error = 3.862e-05
```
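To get a feel for the size of these errors, note that rounding an input to FL fraction bits changes each sample by at most 2^-(FL+1), so the resulting FIR output error is bounded by sum(|h|)·2^-(FL+1). The sketch below checks this bound for input quantization only, using hypothetical real coefficients and data, DT\_input's 14 fraction bits, and an assumed round-to-nearest mode. The tutorial model also quantizes coefficients and intermediate results, so its errors are not expected to match these numbers.

```matlab
% Output error caused by quantizing the FIR input (illustrative sketch)
rng(6);                                        % reproducible test data
FL = 14;                                       % DT_input fraction length, fixdt(1,16,14)
x  = 2*rand(1,1000) - 1;                       % hypothetical real input in [-1, 1)
h  = randn(1,32)/32;                           % hypothetical real FIR coefficients
xq = round(x*2^FL)/2^FL;                       % round-to-nearest input quantization
err   = filter(h,1,xq) - filter(h,1,x);        % error due to input quantization alone
bound = sum(abs(h)) * 2^-(FL+1);               % |error| <= sum|h| * (LSB/2)
fprintf('max error = %.3e, bound = %.3e\n', max(abs(err)), bound);
```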

Tip: Doing hardware architecture design before fixed-point conversion prevents quantization noise from obscuring design errors.



### Step 4: HDL code generation & synthesis

### In this step, you will:

- Check the model for HDL compatibility
- Generate HDL code and reports
- Synthesize the generated design using AMD Vivado







### Step 4.1: Check model for HDL compatibility

1. Save the Simulink model as pulse\_detector\_v4.slx.
2. Set the *Pulse Detector* subsystem as the DUT using the HDL Code Toolstrip (click the lock icon to unlock the DUT selection, select the subsystem in the model, then click the icon again to lock it).
3. Check HDL compatibility of the DUT subsystem using **HDL Code Advisor**. Besides incompatibilities, the tool also checks for settings that may result in inefficient hardware. In many cases, it provides a shortcut to modify those settings.
4. Clear the groups Industry standard checks and Native Floating Point checks, because they do not apply to this example. Highlight the top folder named HDL Code Advisor, then click Run Selected Checks.








5. HDL Code Advisor reports three warnings, explained below. For this exercise, you may ignore the warnings and close the UI.
   - **Check for MATLAB Function block settings** reports the use of saturation and rounding logic, which consumes extra hardware resources. If your design works with floor rounding and wrap on overflow, you may click Modify Settings to apply them automatically.
   - **Check for infinite and continuous sample time sources** reports the use of *inf* sample time in the model, which may inhibit some optimization features. If that is the case, you may click Modify Settings to replace all such instances with back propagation (-1).
   - **Check for model parameters suited for HDL code generation** recommends Simulink diagnostics settings that may help identify code generation issues, such as warnings for unconnected signals.





### Step 4.2: Generate HDL code and reports

1. In MATLAB, run the following command to add AMD Vivado 2023.1 to the system path (adjust the folder to match your installation):

   `hdlsetuptoolpath('ToolName','Xilinx Vivado','ToolPath','C:\Xilinx\Vivado\2023.1\bin');`
2. Launch HDL Workflow Advisor for the DUT subsystem (HDL Code Toolstrip > Workflow Advisor).
3. Configure the target settings in step 1 of HDL Workflow Advisor. Click Apply at each step to save the changes.
4. Open **HDL Code Generation Settings...** in step 3.1.



Target settings used in this example include Target Frequency (MHz): 200.






5. Change Reset type to Synchronous (under **HDL Code Generation** > **Global Settings**), then click **OK** to close the Configuration Parameters dialog.
6. Back in HDL Workflow Advisor, run steps 1 through 3 (right-click step 3.2 > Run to Selected Task).






7. Review the results of *Adaptive Pipelining* and *Delay Balancing* in the generated optimization report. Open the **generated model** using the provided link to visualize the added latency.
8. Review resource usage in the high-level resource report.



Figure: Code Generation Report for pulse\_detector\_v4 (0 warnings, 2 messages), showing the Adaptive Pipelining Report with the number of pipelines inserted for the Product and Product1 blocks in subsystem Compute Power.

Note: The generated model is bit-true & cycle-accurate to the generated HDL. It is used to verify the generated code.



### Step 4.3: Synthesize the generated design

- Run step 4.1 to create a Vivado project. You may open the project using the provided link and continue synthesis in Vivado, especially if you expect it to take a long time.
- Run step 4.2.1 to synthesize the design without launching Vivado. Synthesis runs in the background, and MATLAB is blocked until it completes.



HDL Workflow Advisor task 4.2.1. Run Synthesis. Analysis: run logic synthesis for the specified FPGA device (input parameter: Skip pre-route timing analysis). Result: Passed.

| Resource        | Usage | Available | Utilization (%) |
|-----------------|------:|----------:|----------------:|
| Slice LUTs      |   407 |    171900 |            0.24 |
| Slice Registers |  4528 |    343800 |            1.32 |
| DSPs            |   194 |       900 |           21.56 |
| Block RAM Tile  |     0 |       500 |            0.00 |
| URAM            |     0 |         0 |                 |



### Summary

This completes the tutorial. Compare your work to the solution models in the folder /pulse\_detector/solution.

#### For more information:

- Visit our <u>FPGA and SoC design website</u>
- Explore MathWorks Training on <u>HDL code generation</u> and <u>FPGA signal processing</u>
- Contact us on <u>File Exchange</u> for additional questions